5 research outputs found
Graph-Segmenter: Graph Transformer with Boundary-aware Attention for Semantic Segmentation
Transformer-based semantic segmentation approaches, which divide the image
into regions with sliding windows and model the relations inside each window,
have achieved outstanding success. However, because modeling the relations
between windows was not the primary emphasis of previous work, those relations
have not been fully exploited. To address this issue, we propose
Graph-Segmenter, which includes a Graph Transformer and a Boundary-aware
Attention module. It is an effective network that simultaneously models the
deeper relations between windows (a global view) and between pixels inside
each window (a local view), and it performs substantial, low-cost boundary
adjustment. Specifically, we treat every window and every pixel inside a
window as nodes to construct graphs for both views and devise the Graph
Transformer. The boundary-aware attention module refines the edge information
of target objects by modeling the relationships between pixels on each
object's edge. Extensive experiments on three widely used semantic
segmentation datasets (Cityscapes, ADE20K and PASCAL Context) demonstrate that
our proposed network, a Graph Transformer with Boundary-aware Attention,
achieves state-of-the-art segmentation performance.
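The abstract describes two graphs, one whose nodes are windows (global view) and one whose nodes are the pixels inside a window (local view), plus attention restricted to pixels on object edges. The NumPy sketch below illustrates that idea under loose assumptions and is not the authors' implementation: all function names are hypothetical, the Q/K/V projections are replaced by identity maps, window nodes are mean-pooled pixel features, every graph is fully connected, and boundary pixels are taken from a given label map.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(nodes):
    # Scaled dot-product self-attention over a fully connected graph.
    # nodes: (N, C). Identity Q/K/V projections keep the sketch parameter-free.
    d = nodes.shape[-1]
    weights = softmax(nodes @ nodes.T / np.sqrt(d), axis=-1)
    return weights @ nodes

def window_partition(feat, w):
    # (H, W, C) -> (num_windows, w*w, C); assumes H and W are divisible by w.
    H, W, C = feat.shape
    x = feat.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def window_reverse(windows, w, H, W):
    # Inverse of window_partition: (num_windows, w*w, C) -> (H, W, C).
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def graph_block(feat, w):
    H, W, _ = feat.shape
    windows = window_partition(feat, w)
    # Local view: the pixels inside each window are the nodes of one graph.
    local = np.stack([attention(win) for win in windows])
    # Global view: each window (mean-pooled) is one node of a second graph.
    window_nodes = local.mean(axis=1)
    global_ctx = attention(window_nodes)
    # Broadcast each window's global context back to all of its pixels.
    fused = local + global_ctx[:, None, :]
    return window_reverse(fused, w, H, W)

def boundary_mask(labels):
    # A pixel is on a boundary if any 4-neighbour has a different label.
    mask = np.zeros(labels.shape, dtype=bool)
    mask[:-1, :] |= labels[:-1, :] != labels[1:, :]
    mask[1:, :] |= labels[1:, :] != labels[:-1, :]
    mask[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    mask[:, 1:] |= labels[:, 1:] != labels[:, :-1]
    return mask

def boundary_refine(feat, labels):
    # Attend only among boundary pixels and add the result back residually.
    out = feat.copy()
    mask = boundary_mask(labels)
    if mask.any():
        out[mask] = feat[mask] + attention(feat[mask])
    return out
```

For example, `graph_block(feat, 4)` on an 8×8×16 feature map returns an array of the same shape, and `boundary_refine` then changes only the pixels whose 4-neighbours carry a different label.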
LineMarkNet: Line Landmark Detection for Valet Parking
We aim for accurate and efficient line landmark detection for valet parking,
which is a long-standing yet unsolved problem in autonomous driving. To this
end, we present a deep line landmark detection system whose modules are
carefully designed to be lightweight. Specifically, we first empirically
design four general line landmarks, including three physical lines and one
novel mental line; these four landmarks are effective for valet parking. We
then develop a deep network (LineMarkNet) to detect line landmarks from
surround-view cameras. Via the pre-calibrated homography, we fuse context from
four separate cameras into a unified bird's-eye-view (BEV) space;
specifically, we fuse the surround-view features and the BEV features. We then
employ a multi-task decoder to detect multiple line landmarks, applying a
center-based strategy for the object detection task and designing our graph
transformer, which enhances the vision transformer with hierarchical graph
reasoning, for the semantic segmentation task. Finally, we parameterize the
detected line landmarks (e.g., in intercept-slope form), and a novel filtering
backend incorporates temporal and multi-view consistency to achieve smooth and
stable detection. Moreover, we annotate a large-scale dataset to validate our
method. Experimental results show that our framework achieves improved
performance compared with several line detection methods, and they validate
the efficiency of the multi-task network for real-time line landmark detection
on the Qualcomm 820A platform while maintaining superior accuracy.
Comment: 29 pages, 12 figures
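Two of the geometric steps mentioned above, projecting image points into BEV through a pre-calibrated homography and parameterizing a detected line in intercept-slope form, can be sketched in a few lines. This is an illustration under stated assumptions, not LineMarkNet's code: the homography `H` is assumed to map image pixels to metric BEV coordinates, multi-view consistency is approximated by fitting one line to points pooled from several cameras, and the exponential moving average in the hypothetical `LineSmoother` is a stand-in for the paper's unspecified filtering backend.

```python
import numpy as np

def to_bev(points_uv, H):
    # Map image pixels (N, 2) to BEV coordinates with a 3x3 homography H.
    homog = np.hstack([points_uv, np.ones((len(points_uv), 1))])
    mapped = homog @ H.T
    # Divide by the homogeneous coordinate to get Cartesian BEV points.
    return mapped[:, :2] / mapped[:, 2:3]

def fit_line(points_xy):
    # Least-squares fit of y = k * x + b to (N, 2) BEV points; returns [k, b].
    # Note: slope-intercept form cannot represent lines parallel to the y-axis.
    k, b = np.polyfit(points_xy[:, 0], points_xy[:, 1], deg=1)
    return np.array([k, b])

class LineSmoother:
    # Hypothetical temporal filter: exponential moving average over [k, b].
    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest observation
        self.state = None

    def update(self, params):
        if self.state is None:
            self.state = params.astype(float)
        else:
            self.state = self.alpha * params + (1 - self.alpha) * self.state
        return self.state
```

Because y = kx + b cannot represent lines parallel to the y-axis, a practical system would likely choose the axis per line or switch to a normal (ρ, θ) form; the abstract only says "e.g., intercept-slope form".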
OCR-RTPS: An OCR-based real-time positioning system for the valet parking
Obtaining the position of the ego-vehicle is a crucial prerequisite for
automatic control and path planning in autonomous driving. Most existing
positioning systems rely on GPS, RTK, or wireless signals, which struggle to
provide effective localization under weak-signal conditions. This paper
proposes a real-time positioning system based on detecting parking-space
numbers, since they are unique positioning marks in the parking lot scene. The
system not only helps with positioning in open areas but can also run
independently in isolated environments. Results on both public datasets and a
self-collected dataset show that the system outperforms other methods and is
applicable in practice. In addition, the code and dataset will be released
later.
Comment: 25 pages, 9 figures
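Once parking numbers are read, positioning reduces to a lookup: each recognized number identifies a mark with a known map position, and the vehicle's position follows from the mark's observed offset. The sketch below is hypothetical and not how OCR-RTPS necessarily computes its pose (the abstract does not say). It assumes a `slot_map` from parking number to map coordinates, a known vehicle heading (e.g. from odometry), and per-detection offsets measured in the vehicle frame.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def estimate_position(
    detections: List[Tuple[str, Point]],
    slot_map: Dict[str, Point],
    heading: float,
) -> Optional[Point]:
    # detections: (recognized parking number, mark offset in the vehicle frame).
    # slot_map: hypothetical lookup of parking number -> mark position in the map.
    # heading: vehicle yaw in radians, assumed known.
    c, s = math.cos(heading), math.sin(heading)
    estimates = []
    for number, (dx, dy) in detections:
        if number not in slot_map:
            continue  # OCR misread or unmapped slot: ignore it
        mx, my = slot_map[number]
        # Rotate the vehicle-frame offset into the map frame, then subtract it.
        estimates.append((mx - (c * dx - s * dy), my - (s * dx + c * dy)))
    if not estimates:
        return None
    n = len(estimates)
    return (sum(e[0] for e in estimates) / n, sum(e[1] for e in estimates) / n)
```

Averaging over several recognized marks dampens the error of any single offset, and unknown strings (for instance OCR misreads) are simply skipped.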
Surround-view Fisheye BEV-Perception for Valet Parking: Dataset, Baseline and Distortion-insensitive Multi-task Framework
Surround-view fisheye perception in valet parking scenes is fundamental
and crucial in autonomous driving. Environmental conditions in parking lots,
such as imperfect lighting and opacity, differ from those in common public
datasets and substantially impact perception performance. Most existing
networks trained on public datasets may generalize poorly to these valet
parking scenes, which are further affected by fisheye distortion. In this
article, we introduce a new large-scale fisheye dataset called Fisheye Parking
Dataset (FPD) to promote research on diverse real-world surround-view parking
cases. Notably, our compiled FPD exhibits excellent characteristics for
different surround-view perception tasks. In addition, we propose our
real-time, distortion-insensitive multi-task framework, Fisheye Perception
Network (FPNet), which improves surround-view fisheye BEV perception by
enhancing the fisheye distortion operation and adopting lightweight multi-task
designs. Extensive experiments validate the effectiveness of our approach and
the dataset's exceptional generalizability.
Comment: 12 pages, 11 figures
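The abstract does not specify how FPNet handles fisheye distortion. As background only, the sketch below uses the textbook equidistant fisheye model, in which a ray at angle θ from the optical axis lands at radius r_d = f·θ, whereas a pinhole camera would place it at r_u = f·tan θ. Remapping a fisheye pixel to its pinhole position shows why distortion grows sharply toward the image border. The focal length `f` and principal point `(cx, cy)` are assumed inputs, and the function name is hypothetical; real fisheye calibrations (e.g. OpenCV's fisheye model) add polynomial terms in θ.

```python
import numpy as np

def fisheye_to_pinhole(uv, f, cx, cy):
    # Equidistant fisheye model: r_d = f * theta, where theta is the ray's angle
    # from the optical axis. A pinhole camera would give r_u = f * tan(theta).
    d = np.asarray(uv, dtype=float) - [cx, cy]
    r_d = np.linalg.norm(d, axis=-1, keepdims=True)
    theta = r_d / f
    # Rays at or beyond 90 degrees have no pinhole projection.
    if np.any(theta >= np.pi / 2):
        raise ValueError("point lies outside the pinhole field of view")
    r_u = f * np.tan(theta)
    # Rescale each point radially; the principal point (r_d == 0) maps to itself.
    scale = np.divide(r_u, r_d, out=np.ones_like(r_d), where=r_d > 0)
    return d * scale + [cx, cy]
```

At θ = π/4 the two models differ by a factor of about 1.27 (tan θ / θ ≈ 1.273), and the ratio diverges as θ approaches π/2, which is why a wide fisheye view cannot be rectified into a single pinhole image without heavy stretching at the border.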
PPD: A New Valet Parking Pedestrian Fisheye Dataset for Autonomous Driving
Pedestrian detection in valet parking scenarios is fundamental for
autonomous driving. However, pedestrians can appear in a variety of ways and
postures under imperfect ambient conditions, which can adversely affect
detection performance. Furthermore, models trained on public datasets that
include pedestrians generally produce suboptimal results in these valet
parking scenarios. In this paper, we present the Parking Pedestrian Dataset
(PPD), a large-scale fisheye dataset to support research on real-world
pedestrians, especially those with occlusions and diverse postures. PPD
consists of several distinctive types of pedestrians captured with fisheye
cameras. Additionally, we present a pedestrian detection baseline on the PPD
dataset and introduce two data augmentation techniques that improve the
baseline by enhancing the diversity of the original dataset. Extensive
experiments validate the effectiveness of our novel data augmentation
approaches over baselines and the dataset's exceptional generalizability.
Comment: 9 pages, 6 figures
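The abstract does not name its two augmentation techniques, so the sketch below does not reproduce them. It only illustrates the general requirement that a detection augmentation keep the boxes consistent with the image: a horizontal flip that mirrors `[x1, y1, x2, y2]` boxes, and a generic random-occlusion patch (in the spirit of random erasing) placed inside a pedestrian box to mimic partial occlusion. The function names, box format and gray fill value are assumptions.

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    # Mirror an (H, W, C) image and its (N, 4) [x1, y1, x2, y2] boxes,
    # treating x as a continuous coordinate in [0, W].
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    out = boxes.astype(float).copy()
    # New x1 = W - old x2 and new x2 = W - old x1, so x1 <= x2 still holds.
    out[:, [0, 2]] = w - boxes[:, [2, 0]]
    return flipped, out

def occlude_box(image, box, frac, rng, fill=127):
    # Paint a rectangle covering `frac` of the box's width and height at a
    # random position fully inside the box, simulating partial occlusion.
    x1, y1, x2, y2 = (int(v) for v in box)
    bw = max(1, int((x2 - x1) * frac))
    bh = max(1, int((y2 - y1) * frac))
    ox = rng.integers(x1, max(x1 + 1, x2 - bw + 1))
    oy = rng.integers(y1, max(y1 + 1, y2 - bh + 1))
    out = image.copy()
    out[oy:oy + bh, ox:ox + bw] = fill
    return out
```

Keeping the occluder inside the box leaves the label unchanged, so the detector sees a partially hidden pedestrian with the original ground truth.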